{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "0d7252bc",
   "metadata": {},
   "source": [
    "# Web Scraping with Python"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c0f27f7b",
   "metadata": {},
   "source": [
    "## 1. What is Web Scraping?"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "046e674d",
   "metadata": {},
   "source": [
    "- Web scraping is the process of extracting content and data from a website. \n",
    "\n",
    "- Websites are built with Hypertext Markup Language (HTML); a web scraper downloads that HTML and extracts the elements it needs. \n",
    "\n",
    "- Python offers libraries that let you scrape a website with a few lines of code."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c5fb25b4",
   "metadata": {},
   "source": [
    "## 2. Install the Necessary Libraries"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f539d75b",
   "metadata": {},
   "outputs": [],
   "source": [
    "!pip install pandas\n",
    "!pip install bs4\n",
    "!pip install requests"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "77aa3087",
   "metadata": {},
   "source": [
    "## 3. Import the Libraries"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "8b26d354",
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "from bs4 import BeautifulSoup\n",
    "import requests"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6a6a653b",
   "metadata": {},
   "source": [
    "## 4. Understand the Website"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3d2c7b70",
   "metadata": {},
   "source": [
    "The website I'll use in this tutorial is [Trendyol](https://www.trendyol.com/). I want to extract laptop information such as price, brand, and rating count from this website."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2634d5c3",
   "metadata": {},
   "source": [
    "## 5. Understand the URL"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c5f931a9",
   "metadata": {},
   "source": [
    "Understanding how the URL is structured is important for web scraping. The URL I'll use is [here](https://www.trendyol.com/sr?q=notebook&qt=notebook&st=notebook&os=1&pi=1)\n",
    "\n",
    "I'm going to extract information from the first 10 pages of results returned by the above URL."
   ]
  },
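  {
   "cell_type": "markdown",
   "id": "a7c1e402",
   "metadata": {},
   "source": [
    "Since the page number is just one query parameter in the URL above, the page URLs can be built up front. The cell below is a minimal sketch using the standard library's `urllib.parse`. The helper name `build_page_url` is hypothetical, and the sketch assumes that `pi` is the page-index parameter, as the `pi=1` at the end of the URL above suggests."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a7c1e403",
   "metadata": {},
   "outputs": [],
   "source": [
    "from urllib.parse import urlencode\n",
    "\n",
    "BASE_URL = \"https://www.trendyol.com/sr\"\n",
    "\n",
    "def build_page_url(page, query=\"notebook\"):\n",
    "    # Assumption: pi is the page index; the other parameters are copied from the URL above\n",
    "    params = {\"q\": query, \"qt\": query, \"st\": query, \"os\": 1, \"pi\": page}\n",
    "    return BASE_URL + \"?\" + urlencode(params)\n",
    "\n",
    "# URLs for pages 1 through 10\n",
    "page_urls = [build_page_url(page) for page in range(1, 11)]\n",
    "print(page_urls[0])\n",
    "print(page_urls[-1])"
   ]
  },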
  {
   "cell_type": "markdown",
   "id": "e32e9829",
   "metadata": {},
   "source": [
    "## 6. Create Empty Lists"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "0e50e6f6",
   "metadata": {},
   "outputs": [],
   "source": [
    "price = []\n",
    "brand = []\n",
    "ratingCount = []\n",
    "info = []"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8af069b2",
   "metadata": {},
   "source": [
    "## 7. Web Scraping"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d9b869d1",
   "metadata": {},
   "source": [
    "I'll use a for loop whose variable `pgn` goes through the page numbers 1 through 10. Next, the link is created from two parts: the fixed base of the URL and the page number appended to its end, as identified when examining the URL. Each page is then fetched with `requests.get` and the response is stored in `res`. \n",
    "\n",
    "Then the Beautiful Soup package gives us a way to interact with the HTML of the page, which is stored in `soup`. Next comes a series of for loops inside the initial for loop: each one finds the elements matching a CSS selector (note: the selectors were found with SelectorGadget), and inside the loop each matching element's text is appended to the corresponding list. The outer loop runs until it has gone through the first 10 pages."
   ]
  },
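  {
   "cell_type": "markdown",
   "id": "b3d9f510",
   "metadata": {},
   "source": [
    "To see what the selector step does without sending any requests, the cell below applies it to a small hand-written HTML snippet. This is only a sketch: `sample_html` is made up for illustration and borrows nothing but the class names used in this tutorial, and `select_text` is a hypothetical helper. It assumes the real page marks up its products in a similar way."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b3d9f511",
   "metadata": {},
   "outputs": [],
   "source": [
    "from bs4 import BeautifulSoup\n",
    "\n",
    "# Made-up HTML for illustration; only the class names come from this tutorial\n",
    "sample_html = '''\n",
    "<div><span class=\"prdct-desc-cntnr-ttl\">Dell</span>\n",
    "<span class=\"ratingCount\">(13)</span>\n",
    "<div class=\"prc-box-dscntd\">19.330,07 TL</div></div>\n",
    "<div><span class=\"prdct-desc-cntnr-ttl\">ACER</span>\n",
    "<span class=\"ratingCount\">(7)</span>\n",
    "<div class=\"prc-box-dscntd\">5.779,55 TL</div></div>\n",
    "'''\n",
    "\n",
    "def select_text(html, selector):\n",
    "    # Parse the HTML and return the stripped text of every element matching the selector\n",
    "    soup = BeautifulSoup(html, \"html.parser\")\n",
    "    return [tag.get_text(strip=True) for tag in soup.select(selector)]\n",
    "\n",
    "print(select_text(sample_html, \".prdct-desc-cntnr-ttl\"))\n",
    "print(select_text(sample_html, \".prc-box-dscntd\"))"
   ]
  },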
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "9234379a",
   "metadata": {},
   "outputs": [],
   "source": [
    "for pgn in range(1, 11):  # pages 1 through 10\n",
    "    # Base URL plus the page number in the pi parameter\n",
    "    url = \"https://www.trendyol.com/sr?q=notebook&qt=notebook&st=notebook&os=1&pi=\" + str(pgn)\n",
    "    res = requests.get(url)\n",
    "    # Name the parser explicitly so bs4 does not have to guess one\n",
    "    soup = BeautifulSoup(res.text, \"html.parser\")\n",
    "    # Append the text of every element matching each CSS selector\n",
    "    for brand_select in soup.select(\".prdct-desc-cntnr-ttl\"):\n",
    "        brand.append(brand_select.get_text(strip=True))\n",
    "    for ratingCount_select in soup.select(\".ratingCount\"):\n",
    "        ratingCount.append(ratingCount_select.get_text(strip=True))\n",
    "    for info_select in soup.select(\".prdct-desc-cntnr-name\"):\n",
    "        info.append(info_select.get_text(strip=True))\n",
    "    for price_select in soup.select(\".prc-box-dscntd\"):\n",
    "        price.append(price_select.get_text(strip=True))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "71b3aca4",
   "metadata": {},
   "source": [
    "## 8. Creating the DataFrame"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "bb395ce0",
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "(216, 4)\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Brand</th>\n",
       "      <th>Rating_Count</th>\n",
       "      <th>Info</th>\n",
       "      <th>Price</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Dell</td>\n",
       "      <td>(13)</td>\n",
       "      <td>Latitude 3410 I5 10210u 14\" 8gb Win10 Pro Note...</td>\n",
       "      <td>19.330,07 TL</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>ACER</td>\n",
       "      <td>(7)</td>\n",
       "      <td>Aspire3 A315-56-327t Intel Core I3 1005g1 8gb ...</td>\n",
       "      <td>5.779,55 TL</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>LENOVO</td>\n",
       "      <td>(12)</td>\n",
       "      <td>Ideapad Intel Core I5-1135g7 8gb/512gb Ssd 15....</td>\n",
       "      <td>10.142,02 TL</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>ASUS</td>\n",
       "      <td>(67)</td>\n",
       "      <td>X515ma-ej490 Intel Celeron N4020 4gb 256gb Ssd...</td>\n",
       "      <td>4.791,10 TL</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>LENOVO</td>\n",
       "      <td>(38)</td>\n",
       "      <td>V15 82c700ldtx Athlon Gold 3150u 8gb Ram 256gb...</td>\n",
       "      <td>7.623,42 TL</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>Huawei</td>\n",
       "      <td>(41)</td>\n",
       "      <td>Matebook 14 AMD Ryzen 5 4600H 8GB Ram 512GB SS...</td>\n",
       "      <td>13.999 TL</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>LENOVO</td>\n",
       "      <td>(60)</td>\n",
       "      <td>Yoga 9 82bg007ptx I7-1185g7 16gb Ram 1tb Ssd I...</td>\n",
       "      <td>31.999 TL</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>HP</td>\n",
       "      <td>(17)</td>\n",
       "      <td>Pavilion 14-dv0012nt Core I5 1135g7 8gb 256gb ...</td>\n",
       "      <td>13.999 TL</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>ASUS</td>\n",
       "      <td>(5)</td>\n",
       "      <td>Laptop X515ja-br1968t I3-1005g1 4gb Ram 256gb ...</td>\n",
       "      <td>6.798 TL</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>ACER</td>\n",
       "      <td>(23)</td>\n",
       "      <td>Aspire A314-21 Amd A4-9120 4gb Ram 128gb Ssd 1...</td>\n",
       "      <td>3.900 TL</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    Brand Rating_Count                                               Info  \\\n",
       "0    Dell         (13)  Latitude 3410 I5 10210u 14\" 8gb Win10 Pro Note...   \n",
       "1    ACER          (7)  Aspire3 A315-56-327t Intel Core I3 1005g1 8gb ...   \n",
       "2  LENOVO         (12)  Ideapad Intel Core I5-1135g7 8gb/512gb Ssd 15....   \n",
       "3    ASUS         (67)  X515ma-ej490 Intel Celeron N4020 4gb 256gb Ssd...   \n",
       "4  LENOVO         (38)  V15 82c700ldtx Athlon Gold 3150u 8gb Ram 256gb...   \n",
       "5  Huawei         (41)  Matebook 14 AMD Ryzen 5 4600H 8GB Ram 512GB SS...   \n",
       "6  LENOVO         (60)  Yoga 9 82bg007ptx I7-1185g7 16gb Ram 1tb Ssd I...   \n",
       "7      HP         (17)  Pavilion 14-dv0012nt Core I5 1135g7 8gb 256gb ...   \n",
       "8    ASUS          (5)  Laptop X515ja-br1968t I3-1005g1 4gb Ram 256gb ...   \n",
       "9    ACER         (23)  Aspire A314-21 Amd A4-9120 4gb Ram 128gb Ssd 1...   \n",
       "\n",
       "          Price  \n",
       "0  19.330,07 TL  \n",
       "1   5.779,55 TL  \n",
       "2  10.142,02 TL  \n",
       "3   4.791,10 TL  \n",
       "4   7.623,42 TL  \n",
       "5     13.999 TL  \n",
       "6     31.999 TL  \n",
       "7     13.999 TL  \n",
       "8      6.798 TL  \n",
       "9      3.900 TL  "
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df=pd.DataFrame(columns=['Brand','Rating_Count', 'Info', 'Price'])\n",
    "df['Brand']=pd.DataFrame(brand)\n",
    "df['Rating_Count']=pd.DataFrame(ratingCount)\n",
    "df['Info']=pd.DataFrame(info)\n",
    "df['Price']=pd.DataFrame(price)\n",
    "print(df.shape)\n",
    "df.head(10)"
   ]
  },
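  {
   "cell_type": "markdown",
   "id": "c5e8a620",
   "metadata": {},
   "source": [
    "The `Price` and `Rating_Count` values shown above are still strings such as `19.330,07 TL` and `(13)`. As a possible next step, the cell below sketches how they could be turned into numbers. The helper names `parse_price` and `parse_rating_count` are hypothetical, and the sketch assumes the format seen in the output above: `.` as the thousands separator, `,` as the decimal separator, and a trailing `TL`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c5e8a621",
   "metadata": {},
   "outputs": [],
   "source": [
    "def parse_price(text):\n",
    "    # Drop the currency suffix, then convert \"19.330,07\" to \"19330.07\"\n",
    "    number = text.replace(\"TL\", \"\").strip()\n",
    "    number = number.replace(\".\", \"\").replace(\",\", \".\")\n",
    "    return float(number)\n",
    "\n",
    "def parse_rating_count(text):\n",
    "    # Remove surrounding whitespace and parentheses, e.g. \"(13)\" becomes 13\n",
    "    return int(text.strip().strip(\"()\"))\n",
    "\n",
    "print(parse_price(\"19.330,07 TL\"))\n",
    "print(parse_rating_count(\"(13)\"))"
   ]
  },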
  {
   "cell_type": "markdown",
   "id": "33fb0ac0",
   "metadata": {},
   "source": [
    "Thanks for reading 😀\n",
    "\n",
    "Don't forget to follow Tirendaz Academy on [YouTube-Tr](https://youtube.com/c/tirendazakademi) | [YouTube-Eng](https://www.youtube.com/channel/UCFU9Go20p01kC64w-tmFORw) | [Twitter](https://twitter.com/TirendazAcademy) | [Medium](https://tirendazacademy.medium.com) | [GitHub](https://github.com/TirendazAcademy) | [LinkedIn](https://www.linkedin.com/in/tirendaz-academy)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.12"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
